Configure Harvester Settings
From Admin > Harvester Configuration you can update the Harvester Settings and the Global Harvester Rule Sets.
On installation, default Harvester Settings and Global Harvester Rule Sets are applied.
You can modify the Harvester settings to suit the requirements of your system.
Changes apply only to new Harvester jobs, not to jobs that are already running.
Update Harvester Settings
Accessing the Harvester Settings
Depending on how the Sintelix Agent was installed, the Harvester Settings may be accessed locally within Sintelix, or remotely in the Sintelix Agent.
The Harvester Settings are exactly the same, no matter how they are accessed.
Local Harvester Settings
To update the local Harvester Settings:
- Select Admin > Harvester Configuration.
- Change the System Harvester Settings as required (see System Harvester Settings).
- Select Save Settings.
Remote Harvester Settings
If the Sintelix Agent was installed remotely, the harvester settings are accessed from within the Sintelix Agent.
- Log in to the Sintelix Agent (see Login to the Sintelix Agent).
- Select Harvester Settings in the top right of the screen.
- Change the System Harvester Settings as required (see System Harvester Settings).
- Select Save Settings.
System Harvester Settings
Maximum Concurrent Harvest Jobs | Enter an integer between 0 and 65. It is recommended that the maximum number of concurrent browsers be set to half the number of cores on the server running Sintelix. For instance, if Sintelix is installed on a server with 8 cores, set the maximum number of concurrent browsers to 4. A higher value will reduce performance. |
Maximum Workers | Set the maximum number of workers available to the system. A running harvest job is backed by workers, and more workers increase the speed of a harvest. You may set a limit for each harvest job to reserve capacity in a multi-user environment. |
Same-domain Wait Time | Enter the minimum and/or maximum delay in seconds. This adds a delay between page requests to the same domain. Adding a delay helps prevent an IP address from being blocked due to high traffic. |
Worker Screen Size | Enter the width and height, in pixels, to define the screen size of the browser. The screen size can be set larger than the monitor. |
Proxy | Set up a proxy. |
Extra Browser Parameters | Enter any additional browser parameters, such as disable web-security for Chrome. |
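The recommendation for Maximum Concurrent Harvest Jobs (half the server's cores, within the accepted range of 0 to 65) can be worked out with a short calculation. The sketch below is illustrative only and is not part of Sintelix: the function name recommended_max_jobs is hypothetical, and it assumes the script is run on the server that hosts Sintelix. Note that Python reports logical cores, which may be higher than the number of physical cores.

```python
import os


def recommended_max_jobs(cores, lower=0, upper=65):
    """Return half the core count, clamped to the range the setting accepts.

    Hypothetical helper for illustration; not a Sintelix function.
    """
    return max(lower, min(upper, cores // 2))


if __name__ == "__main__":
    # os.cpu_count() returns logical cores and may return None.
    cores = os.cpu_count() or 1
    print(f"Cores detected: {cores}")
    print(f"Suggested Maximum Concurrent Harvest Jobs: {recommended_max_jobs(cores)}")
```

For a server with 8 cores this gives 4, matching the example in the table. Treat the result as a starting point and enter it manually in the Maximum Concurrent Harvest Jobs field.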
Update Harvester Rule Sets
Harvester rule sets are updated periodically to keep up with front-end changes to the websites they are designed to harvest content from. This option lets you check for and apply updates to your global default Harvester rule sets automatically.
To update your Global Harvester rule sets:
- In Sintelix, navigate to Admin > Harvester Configuration.
- Scroll down to the Global Harvester rule sets section.
- Under System Wide Updates, if there are any updates available, select Update Global rule sets.
If successful, a notification appears with the message ‘No need to update your harvester rule sets’.